Tools for 3D-object retrieval: Karhunen-Loeve transform and spherical harmonics

Authors

  • Dejan V. Vranic
  • Dietmar Saupe
  • Jörg Richter
Abstract

We present tools for 3D object retrieval in which a model, given as a polygonal mesh, serves as a query and similar objects are retrieved from a collection of 3D objects. Algorithms proceed first by a normalization step (pose estimation) in which models are transformed into a canonical coordinate frame. Second, feature vectors are extracted and compared with those derived from normalized models in the search space. Using a metric in the feature vector space, nearest neighbors are computed and ranked. Objects thus retrieved are displayed for inspection, selection, and processing. For the pose estimation we introduce a modified Karhunen-Loeve transform that takes into account not only vertices or polygon centroids from the 3D models but all points in the polygons of the objects. Some feature vectors can be regarded as samples of functions on the 2-sphere. We use Fourier expansions of these functions as uniform representations allowing embedded multi-resolution feature vectors. Our implementation demonstrates and visualizes these tools.

INTRODUCTION AND PREVIOUS WORK

Objects in databases have traditionally been accessed using attached information such as textual annotation. Recently, methods for retrieving multimedia documents using audio-visual content as a key are being developed and standardized in MPEG-7 [3]. Many similarity-based retrieval systems have been designed for still images, audio, and video, while only a few techniques for content-based 3D model retrieval have been reported [2, 3, 4, 5, 6, 7]. In this paper we discuss two tools for 3D object retrieval in which a 3D model given as a triangle mesh serves as a query key and similar objects are retrieved from a collection of 3D objects.

Content-based 3D model retrieval algorithms typically proceed in three steps:

1. Normalization (pose estimation). 3D models are given in arbitrary units of measurement and in unpredictable positions and orientations in 3D space.
The normalization step transforms a model into a canonical coordinate frame. The goal of this procedure is that if one chose a different scale, position, rotation, or orientation of an original model, then the representation in the canonical coordinate frame would still be the same. Moreover, since objects may have different levels of detail (e.g., after a mesh simplification to reduce the number of polygons), their normalized representations should be as similar as possible. The normalization step ensures that models can be retrieved regardless of the choices their authors have made for their mesh representation.

2. Feature extraction. The features capture the 3D shape of the objects. Proposed features range from simple bounding box parameters [5] to complex image-based representations [2]. Usually, the features are stored as vectors with real-valued components and fixed dimension. There is a tradeoff between the required storage, computational complexity, and the resulting retrieval performance.

3. Similarity search. The features are designed so that similar 3D objects are assigned vectors that are close in feature vector space. Using a suitable metric, nearest neighbors are computed and ranked. A variable number of objects are thus retrieved by listing the top-ranking items.

There have been several approaches to the normalization step, the most prominent one being principal component analysis (PCA), also known as the Karhunen-Loeve or Hotelling transform, which produces an affine transformation of space. The transform is defined by a set of vectors, e.g., the set of vertices of a 3D model. After a translation of the set that moves its center of mass to the origin of the coordinate system, a rotation is applied so that the largest spread of the transformed points (the variance) is along the x-axis. Then a rotation around the x-axis is carried out so that the maximal spread in the yz-plane occurs along the y-axis.
Finally, the object is scaled to a certain unit size. Essentially, that is the approach taken in [3]. A serious problem is that differing sizes of triangles are not taken into account, which may cause widely varying normalized coordinate frames for models that are identical except for finer triangle resolution in some parts of the model. As a solution to this issue we introduced appropriately chosen vertex weights for the PCA [7], while Paquet et al. [5] used centers of gravity of triangles as vectors for the PCA with weights proportional to triangle areas. Such methods improve retrieval results, see Figure 2.

The shape descriptor in [6] is invariant only with respect to rotations of 90 degrees around coordinate axes. The invariance was attained using a well-known general principle: any feature vector can be made invariant with respect to a finite group of transformations of space by summing or averaging feature vectors computed from all possible transformations of an object.

In [4] the pose estimation is based on moments of solid objects. However, 3D models are not guaranteed to consist of closed surfaces bounding one or more solids, and it would be a difficult and questionable undertaking to force objects to be solids by stitching up surfaces with boundaries. Therefore, the approach is suitable only for a small class of 3D models.

In this paper we propose two tools for Steps 1 and 2 of any algorithm following the general layout. For the pose estimation we generalize the Karhunen-Loeve transform so that all of the (infinitely many) points in the polygons of an object are equally relevant for the transformation. For the feature vectors we notice that a class of them can be regarded as taking samples of functions on the 2-sphere. Using Fourier expansions of these functions provides a new uniform approach which facilitates embedded multi-resolution feature vectors. Our implementation demonstrates and visualizes these tools.
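
Step 3 of the general layout (similarity search) can be illustrated with a minimal sketch. The text only asks for "a suitable metric"; the Euclidean distance used here is an assumption, and the function name `rank_by_distance`, the model names, and the three-component feature vectors are hypothetical and made up for illustration.

```python
import math

def rank_by_distance(query, database, k=3):
    """Return the names of the k feature vectors nearest to `query`.

    Assumption: Euclidean distance is the metric; the paper does not fix one.
    """
    scored = sorted((math.dist(query, vec), name) for name, vec in database.items())
    return [name for _, name in scored[:k]]

# Hypothetical feature vectors of already normalized models.
database = {
    "chair": [0.9, 0.1, 0.3],
    "table": [0.8, 0.2, 0.4],
    "plane": [0.1, 0.9, 0.7],
}

# Query with the feature vector of a new model and list the top two matches.
ranking = rank_by_distance([0.85, 0.15, 0.3], database, k=2)
print(ranking)
```

A linear scan like this is enough to show the ranking; a large collection would presumably use an index structure instead.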
CANONICAL COORDINATE FRAME

In this section we outline the details of our continuous PCA and the associated Karhunen-Loeve transform. We regard a given triangle mesh as a set of triangles $T = \{T_1, \ldots, T_m\}$, $T_i \subset \mathbb{R}^3$, given by a set of vertices (geometry) $P = \{p_1, \ldots, p_n\}$, $p_i = (x_i, y_i, z_i) \in \mathbb{R}^3$, and a table with a list of indices of three vertices for each triangle (topology). Then $I = \bigcup_{i=1}^{m} T_i$ is the point set of all triangles, i.e., our given object. Our goal is to derive an affine map $\tau : \mathbb{R}^3 \to \mathbb{R}^3$ in such a way that for an arbitrary concatenation $\sigma$ of translations, rotations, reflections, and scalings the desired invariance property of $\tau$, namely $\tau(I) = \tau(\sigma(I))$, holds, where we have set $\sigma(I) := \{\sigma(v) \mid v \in I\}$ and similarly for $\tau$.

Let $S_i$ be the area of triangle $T_i$, $i = 1, \ldots, m$. For simplicity of notation we may assume that the triangles intersect only on subsets of measure zero, so that we may write the total surface area of the model as

$$S := S_1 + \cdots + S_m = \int_I dv.$$

Translation invariance is accomplished by translating the center of gravity of a model, $c$, to the origin, i.e., by forming the point set $I_1 := I - c = \{u \mid u = v - c,\ v \in I\}$.

To secure rotation invariance we apply the PCA to the set $I_1$. First, we calculate the $3 \times 3$ covariance matrix

$$M = \frac{1}{S} \int_{I_1} v \cdot v^T \, dv.$$

The remaining part of this step follows the standard PCA. Since $M$ is a symmetric real matrix, its eigenvalues are real and its eigenvectors are orthogonal. We calculate the eigenvalues of $M$, sort them in decreasing order, compute the corresponding eigenvectors, and scale them to Euclidean unit length. We form the rotation matrix $R$, which has the scaled eigenvectors as rows. Afterwards, we rotate the set $I_1$ and obtain a new point set $I_2 = R \cdot I_1 = \{v \mid v = R \cdot u,\ u \in I_1\}$.

To ensure reflection invariance we multiply points in $I_2$ by a diagonal matrix $F = \mathrm{diag}(\mathrm{sign}(f_x), \mathrm{sign}(f_y), \mathrm{sign}(f_z))$, where

$$f_x = \frac{1}{S} \int_{I_2} \mathrm{sign}(v_x)\, v_x^2 \, dv$$

($f_y$, $f_z$ similarly), and $v = (v_x, v_y, v_z) \in I_2$.
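
The translation and rotation steps above can be sketched in NumPy. The paper omits its integral formulas, so the closed form used here is our own derivation and not taken from the source: for a triangle with area $S_i$ and vertices $a, b, c$, and $g = a + b + c$, the integral of $v \cdot v^T$ over the triangle equals $\frac{S_i}{12}(a a^T + b b^T + c c^T + g g^T)$. The function name `continuous_pca` and the test mesh are hypothetical. The reflection and scaling steps are not included.

```python
import numpy as np

def continuous_pca(vertices, triangles):
    """Return (c, R, M): center of gravity, matrix of unit eigenvectors
    (as rows, by decreasing eigenvalue), and covariance matrix of the
    surface of a triangle mesh, integrating over all surface points."""
    P = np.asarray(vertices, dtype=float)
    T = np.asarray(triangles, dtype=int)
    v0, v1, v2 = P[T[:, 0]], P[T[:, 1]], P[T[:, 2]]

    # Triangle areas S_i and total surface area S.
    areas = 0.5 * np.linalg.norm(np.cross(v1 - v0, v2 - v0), axis=1)
    S = areas.sum()

    # Center of gravity of the surface: area-weighted mean of triangle centroids.
    c = (areas[:, None] * (v0 + v1 + v2) / 3.0).sum(axis=0) / S

    # Covariance matrix M = (1/S) * integral over I_1 of v v^T dv,
    # using the per-triangle closed form stated in the lead-in (an assumption).
    v0, v1, v2 = v0 - c, v1 - c, v2 - c
    g = v0 + v1 + v2
    M = np.zeros((3, 3))
    for p in (v0, v1, v2, g):
        M += np.einsum("i,ij,ik->jk", areas, p, p)
    M /= 12.0 * S

    # Unit eigenvectors sorted by decreasing eigenvalue, stored as rows of R.
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1]
    R = eigvecs[:, order].T
    return c, R, M

# A flat 1 x 4 rectangle whose long side lies along the y-axis.
vertices = [(0, 0, 0), (1, 0, 0), (1, 4, 0), (0, 4, 0)]
triangles = [(0, 1, 2), (0, 2, 3)]
c, R, M = continuous_pca(vertices, triangles)

# Transforming only the vertex set P is sufficient: v -> R (v - c).
normalized = (np.asarray(vertices, dtype=float) - c) @ R.T
```

Because the integrals run over the surface, subdividing a triangle into smaller ones leaves `c` and `M` unchanged, which is the property that vertex-based PCA lacks. The signs of the eigenvectors returned by `np.linalg.eigh` are arbitrary, so `R` may include a reflection; the matrix $F$ described above is what resolves that ambiguity.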
Scaling invariance is achieved by scaling the set $I_2$ by the inverse of

$$s = \sqrt{(s_x^2 + s_y^2 + s_z^2)/3},$$

where $s_x$, $s_y$, and $s_z$ denote the average distances of points $v \in I_2$ from the yz-, xz-, and xy-coordinate planes, respectively, i.e., $s_x = \frac{1}{S} \int_{I_2} |v_x| \, dv$ and likewise for $s_y$, $s_z$.

Putting all of the above together, the affine map $\tau$ defined by

$$\tau(v) = s^{-1} \cdot F \cdot R \cdot (v - c)$$

is applied to all points of the original object $I$. In practice, it suffices to transform only the set of vertices $P$. In contrast to the usual application of the PCA, we work with sums of integrals over triangles in place of sums over vertices, which makes our approach more complete because it takes into account all points of the model $I$ with equal weight. The calculation of the integrals is only slightly more expensive. Due to space restrictions we omit the formulas, which can easily be derived.

SPHERICAL HARMONIC REPRESENTATION

Some feature vectors can be considered as samples of a function on the sphere $S^2$. For example, for a (normalized) model $I$ define

$$r : S^2 \to \mathbb{R}, \quad u \mapsto \max\{r \ge 0 \mid r u \in I \cup \{0\}\},$$

where $0$ is the origin. This function $r(u)$ measures the extent of the object in the directions given by $u \in S^2$, compare [7]. Similarly, one may consider a rendered perspective projection of the object on an enclosing sphere as another example (compare [2]). If we can characterize such maps with a small number of parameters, then these can be regarded as good candidates for feature vectors in 3D object retrieval.

The Fourier transform on the sphere provides a suitable approach that uses the spherical harmonic functions $Y_l^m$ to represent any spherical function $r \in L^2(S^2)$ as

$$r = \sum_{l \ge 0} \sum_{|m| \le l} \hat{r}(l, m)\, Y_l^m.$$

Here $\hat{r}(l, m)$ denotes a Fourier coefficient, and the spherical harmonic basis functions are certain products of Legendre functions and complex exponentials.
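
The extent function $r(u)$ defined above can be evaluated for a triangle mesh by casting a ray from the origin in direction $u$ and keeping the farthest hit. The sketch below uses the Möller-Trumbore ray-triangle intersection test, which is our choice for illustration; the paper does not say how $r(u)$ is computed. The function names `ray_triangle_distance` and `extent` and the two-triangle test mesh are hypothetical.

```python
import numpy as np

def ray_triangle_distance(u, a, b, c, eps=1e-12):
    """Distance t >= 0 at which the ray t*u from the origin hits triangle
    (a, b, c), or None if it misses (Moeller-Trumbore test)."""
    e1, e2 = b - a, c - a
    h = np.cross(u, e2)
    det = e1 @ h
    if abs(det) < eps:            # ray is parallel to the triangle plane
        return None
    s = -a                        # ray origin (0, 0, 0) minus vertex a
    bu = (s @ h) / det            # first barycentric coordinate
    q = np.cross(s, e1)
    bv = (u @ q) / det            # second barycentric coordinate
    t = (e2 @ q) / det            # distance along the ray
    if bu < 0 or bv < 0 or bu + bv > 1 or t < 0:
        return None
    return t

def extent(u, vertices, triangles):
    """r(u): largest distance from the origin to the mesh along direction u,
    or 0 if the ray misses the mesh."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)
    P = np.asarray(vertices, dtype=float)
    best = 0.0
    for i, j, k in triangles:
        t = ray_triangle_distance(u, P[i], P[j], P[k])
        if t is not None:
            best = max(best, t)
    return best

# Two parallel triangles above the origin, at heights z = 1 and z = 2.
vertices = [(-1, -1, 1), (1, -1, 1), (0, 1, 1),
            (-1, -1, 2), (1, -1, 2), (0, 1, 2)]
triangles = [(0, 1, 2), (3, 4, 5)]

r_up = extent((0, 0, 1), vertices, triangles)     # farthest hit along +z
r_down = extent((0, 0, -1), vertices, triangles)  # no surface along -z
```

Looping over all triangles for every direction is the simplest option; for many sample directions a spatial acceleration structure or a rendering-based approach would likely be faster.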
The (complex) Fourier coefficients can be efficiently computed by a spherical FFT algorithm applied to samples taken at the points

$$u_{ij} = (\cos\varphi_i \cos 2\varphi_j,\ \cos\varphi_i \sin 2\varphi_j,\ \sin\varphi_i),$$

where $\varphi_k = (2k + 1 - n)\pi / 2n$, $k = 0, \ldots, n-1$, and $i, j = 0, \ldots, n-1$. We cannot give more details here and refer to the survey and software in [1]. An example output of the absolute values of the spherical Fourier coefficients (up to $l = 3$) is given here:

l = 0:  0.37
l = 1:  0.020  0.052  0.020
l = 2:  0.068  0.012  0.012  0.012  0.068
l = 3:  0.0052  0.0025  0.0032  0.0026  0.0032  0.0025  0.0052

Feature vectors can be extracted from the first l rows of coefficients. This implies that such a feature vector contains all feature vectors of the same type of smaller dimension, thereby providing a novel embedded multi-resolution approach for 3D shape feature vectors, see also Figure 1.
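
The embedding property can be seen directly on the example coefficients above. In this minimal sketch the values are grouped into rows of 1, 3, 5, and 7 entries (one row per degree l, which is our reading of the listing), and the helper name `feature_vector` is hypothetical. A feature vector built from fewer rows is a prefix of one built from more rows.

```python
# Absolute values of the example spherical Fourier coefficients, one row per
# degree l = 0..3 (row l has 2l + 1 entries).
rows = [
    [0.37],
    [0.020, 0.052, 0.020],
    [0.068, 0.012, 0.012, 0.012, 0.068],
    [0.0052, 0.0025, 0.0032, 0.0026, 0.0032, 0.0025, 0.0052],
]

def feature_vector(rows, num_rows):
    """Concatenate the first `num_rows` rows of coefficients into one vector."""
    return [value for row in rows[:num_rows] for value in row]

coarse = feature_vector(rows, 2)   # 1 + 3 = 4 components
fine = feature_vector(rows, 4)     # 1 + 3 + 5 + 7 = 16 components

# The coarse vector is embedded in the fine one as its leading components.
print(fine[:len(coarse)] == coarse)
```

A retrieval system could therefore store only the longest vector and compare truncated prefixes when a cheaper, coarser match is acceptable.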

Publication date: 2001